THE LINEAR BOUND IN $A_2$ FOR CALDERON-ZYGMUND
OPERATORS: A SURVEY

MICHAEL LACEY 

Abstract. For an $L^2$-bounded Calderon-Zygmund Operator $T$ acting on $L^2(\mathbb{R}^d)$, and a weight $w \in A_2$, the norm of $T$ on $L^2(w)$ is dominated by $C_T\|w\|_{A_2}$. This recent theorem completes a line of investigation initiated by Hunt-Muckenhoupt-Wheeden in 1973 [HMW73], and has been established in different levels of generality by a number of authors over the last few years. It has a subtle proof, whose full implications will unfold over the next few years. This sharp estimate requires that the $A_2$ character of the weight be used exactly once in the proof. Accordingly, a large part of the proof uses two-weight techniques, is based on novel decomposition methods for operators and weights, and yields new insights into the Calderon-Zygmund theory. We survey the proof of this Theorem in this paper.



1. Introduction 

We survey recent developments on the norm behavior of classical Calderon-Zygmund operators on weighted spaces, with a special focus on the Muckenhoupt-Wheeden class of weights $A_2$. Indeed, some forty years after the class of $A_p$ weights was introduced by Muckenhoupt and Wheeden, the theory has reached a natural milestone, with the sharp dependence of norm estimates being established. We concentrate on an exposition of the techniques behind this Theorem:

Theorem 1.1. Let $T$ be an $L^2$-bounded Calderon-Zygmund operator acting on $L^2(\mathbb{R}^d)$ (for a precise definition see Definition 2.1), and let $w \in A_2$ (for a precise definition see Definition 3.1). We then have the estimate

$$\|Tf\|_{L^2(w)} \le C_T\,\|w\|_{A_2}\,\|f\|_{L^2(w)}. \tag{1.2}$$

Here $0 < C_T < \infty$ depends only on the operator $T$ and the dimension $d$.

The theory of weights came of age in 1973, with the result of Hunt-Muckenhoupt-Wheeden [HMW73], which showed in one dimension that for $w > 0$ a.e., the Hilbert transform is bounded on $L^2(w)$ if and only if $w \in A_2$. This result was later established for other suitable collections of singular integrals in higher dimensions. But early proofs of this fact delivered poor control of the norm, and indeed, the significance of the sharp dependence was a theme recognized only over time.
The interest here is that the first power of the $A_2$ characteristic in (1.2) is in general sharp. Accordingly, the method of proof is delicate, and indeed sheds new light on methods and techniques appropriate for weighted spaces, as well as on the structure of Calderon-Zygmund operators.

It is known that the estimate (1.2), together with sharp extrapolation [DGPP05], gives the sharp estimate on $L^p(w)$ for $1 < p < \infty$; accordingly, we concentrate on the $L^2$ case. We recall definitions in the next two sections, and then recall different elements of one of the known proofs of this Theorem, the pleasingly direct proof of Hytonen-Perez-Treil-Volberg [HPTV]. The concluding section includes some historical remarks, and a variety of pointers to cognate results and approaches.

Acknowledgment. Due to my, and my father's, personal connection to Polish mathematicians, it was my distinct pleasure to participate in the conference marking the centenary of the birth of Jozef Marcinkiewicz. It was a fitting testament to the life of Marcinkiewicz: to what was accomplished, to what was lost, and finally, to what the people of Poznan and Poland can now achieve, in their beautiful and prosperous city and country.

2. Calderon-Zygmund Operators 

There are two canonical examples of Calderon-Zygmund operators that one can keep in mind. The first is the Hilbert transform itself, defined by

$$Hf(x) := \lim_{\epsilon \to 0} \int_{|y| > \epsilon} f(x-y)\,\frac{dy}{y}.$$

Here, one should note that if $f$ is Schwartz class, then the limit above exists for all $x$; it is referred to as the principal value of the integral. In brief, $Hf = \mathrm{p.v.}\, f * \frac{1}{x}$. But the Hilbert transform is a convolution operator, which introduces a subtle simplification in its analysis in Lebesgue space. (There is no paraproduct to control.) Aside from the Hilbert transform, the other canonical convolution operators are the Beurling operator in the plane, and the vector of Riesz transforms.

A second example to keep in mind, one that motivated much of the development of the theory in the 1980s, is the Calderon Commutator, defined as follows. For a Lipschitz function $A$ on $\mathbb{R}$, let

$$C_A f(x) := \mathrm{p.v.} \int \frac{A(x) - A(y)}{(x-y)^2}\, f(y)\,dy.$$

Note that, up to sign, $C_A$ is the commutator $[M_A, \frac{d}{dx} H]$, where $M_A$ is the operation of multiplication by $A$. We require $A$ to be Lipschitz, as then we have, in some average sense, $\frac{A(x)-A(y)}{(x-y)^2} \approx (x-y)^{-1}$. And the deep fact is that we have $\|C_A\|_{2\to 2} \lesssim \|A\|_{\mathrm{Lip}}$. But this is not at all easy to prove!

We now give the general definition of the Calderon-Zygmund operators we will consider.

Definition 2.1 (Calderon-Zygmund Operators). Let $0 < \delta \le 1$ and let $K(x,y) \colon \mathbb{R}^d \times \mathbb{R}^d \setminus \{(x,x) : x \in \mathbb{R}^d\} \to \mathbb{R}$ satisfy the kernel estimates

$$|K(x,y)| \le C_T\,|x-y|^{-d}, \qquad x \neq y \in \mathbb{R}^d, \tag{2.2}$$
$$|K(x,y) - K(x',y)| + |K(y,x) - K(y,x')| \le C_T\,\frac{|x-x'|^{\delta}}{|x-y|^{d+\delta}},$$

with the second condition holding provided $|x - x'| < \frac12 |x-y|$. Here, $0 < C_T < \infty$. Occasionally, $K(x,y)$ will be referred to as a Calderon-Zygmund kernel. Consider a linear operator $T \colon L^2 \to L^2$ such that

$$Tf(x) = \int K(x,y)\, f(y)\,dy, \qquad x \notin \operatorname{supp} f,$$

for a fixed kernel $K(x,y)$. We then say that $T$ is a Calderon-Zygmund Operator, and write $T \in CZO_\delta$ and

$$\|T\|_{CZO_\delta} := \|T\|_{L^2(dx) \to L^2(dx)} + C_T < \infty.$$
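As a sanity check on Definition 2.1, the following sketch samples random points and measures the two ratios in (2.2) for the one-dimensional kernel $K(x,y) = 1/(x-y)$ (the Hilbert kernel, without normalizing constants), taking $\delta = 1$. When $|x - x'| < \frac12|x-y|$ one has $|x'-y| > \frac12|x-y|$, which suggests that $C_T = 4$ suffices. The sampling ranges and function names are illustrative choices, not part of the text.

```python
import random


def kernel(x, y):
    """Hilbert kernel K(x, y) = 1/(x - y), defined for x != y."""
    return 1.0 / (x - y)


def worst_ratios(trials=20000, seed=1):
    """Largest observed ratios in the size and smoothness estimates (d = 1, delta = 1)."""
    rng = random.Random(seed)
    worst_size = worst_smooth = 0.0
    for _ in range(trials):
        x, y = rng.uniform(-10, 10), rng.uniform(-10, 10)
        r = abs(x - y)
        if r == 0:
            continue
        # size estimate: |K(x, y)| * |x - y|^d should stay bounded
        worst_size = max(worst_size, abs(kernel(x, y)) * r)
        # perturb x inside the admissible region |x - x'| < |x - y| / 2
        xp = x + rng.uniform(-0.499, 0.499) * r
        h = abs(x - xp)
        if h == 0:
            continue
        diff = abs(kernel(x, y) - kernel(xp, y)) + abs(kernel(y, x) - kernel(y, xp))
        # smoothness estimate: diff * |x - y|^(d + delta) / |x - x'|^delta
        worst_smooth = max(worst_smooth, diff * r ** 2 / h)
    return worst_size, worst_smooth


size_ratio, smooth_ratio = worst_ratios()
print(size_ratio, smooth_ratio)   # size ratio is 1 up to rounding; smoothness ratio stays below 4
```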

One should note that in one dimension, the kernel $K(x,y) = \frac{1}{|x-y|}$ is a Calderon-Zygmund kernel, though the corresponding operator is not bounded. As well, it is hardly clear that the Calderon Commutator is a bounded operator. Thus, it is a natural question to find a simple characterization of the Calderon-Zygmund Operators. This class of operators was characterized by David and Journe [DJ84], in the famous T1 Theorem.

Theorem 2.3 (T1 Theorem). An operator $T$ with a Calderon-Zygmund kernel is $L^2$-bounded if and only if there is a constant $\mathcal{T} > 0$ for which we have the two estimates, uniformly over all cubes $I \subset \mathbb{R}^d$:

$$\int_I |T\chi_I|^2\,dx \le \mathcal{T}^2\,|I|, \tag{2.4}$$
$$\int_I |T^*\chi_I|^2\,dx \le \mathcal{T}^2\,|I|. \tag{2.5}$$

In the second line, $T^*$ is the adjoint of $T$; namely, it has the kernel $K(y,x)$.



The import of this result is that the full $L^2$-inequality already follows from the boundedness of the operator on a very small set of functions, namely the indicators of cubes. We should note that this is not the formulation of the Theorem as in [DJ84], but the version found in [Ste93, Chapter V]. We prefer the form above over its more familiar formulation, as it does not require the supplemental space BMO. We refer to the two conditions (2.4) and (2.5) as Sawyer testing conditions, as their use in characterizing the boundedness of operators first appeared in his two-weight Theorems on the Maximal Function [Saw82] and the Fractional Integrals [Saw88].

Let us use this Theorem to see that the Calderon Commutator is bounded. Take the interval $I = [a,b]$; then, using integration by parts,

$$C_A(\chi_I)(x) = \int_a^b \frac{A(x)-A(y)}{(x-y)^2}\,dy = \frac{A(x)-A(b)}{x-b} - \frac{A(x)-A(a)}{x-a} + \int_a^b \frac{A'(y)}{x-y}\,dy.$$

The first two terms are bounded by $\|A\|_{\mathrm{Lip}}$, and the third is the Hilbert transform applied to $A'\chi_{(a,b)} \in L^\infty$, whose $L^2$ norm is at most $\|A\|_{\mathrm{Lip}}|I|^{1/2}$. Hence the testing condition (2.4) for $C_A$ follows from the $L^2$-boundedness of the Hilbert transform. (Since the kernel of $C_A$ is antisymmetric, $C_A^* = -C_A$, so (2.5) follows as well.)
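The integration by parts above can be checked numerically. The sketch below assumes the illustrative choices $A = \sin$ (so $\|A\|_{\mathrm{Lip}} = 1$), $I = [0,1]$, and a point $x$ outside $I$, where no principal value is needed; it compares both sides of the identity using Simpson's rule.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def commutator_on_interval(A, x, a, b):
    # C_A(chi_[a,b])(x) = int_a^b (A(x) - A(y)) / (x - y)^2 dy, for x outside [a, b]
    return simpson(lambda y: (A(x) - A(y)) / (x - y) ** 2, a, b)


def integrated_by_parts(A, dA, x, a, b):
    # boundary terms plus the Hilbert-transform term int_a^b A'(y) / (x - y) dy
    boundary = (A(x) - A(b)) / (x - b) - (A(x) - A(a)) / (x - a)
    return boundary + simpson(lambda y: dA(y) / (x - y), a, b)


lhs = commutator_on_interval(math.sin, 2.5, 0.0, 1.0)
rhs = integrated_by_parts(math.sin, math.cos, 2.5, 0.0, 1.0)
print(lhs, rhs)   # the two values agree to many digits
```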



3. The $A_p$ Weights



The $A_p$ weights have the following definition.

Definition 3.1. For $w$ an a.e. positive function (a weight) on $\mathbb{R}^d$, we define the $A_p$ characteristic of $w$ to be

$$\|w\|_{A_p} := \sup_I \Bigl(\frac{1}{|I|}\int_I w\,dx\Bigr)\Bigl(\frac{1}{|I|}\int_I w^{-1/(p-1)}\,dx\Bigr)^{p-1}, \qquad 1 < p < \infty,$$

where the supremum is over all cubes in $\mathbb{R}^d$. In the case of $p = 1$, we set

$$\|w\|_{A_1} := \Bigl\|\frac{Mw}{w}\Bigr\|_{L^\infty}.$$

We note that $\|w\|_{A_p}$ is not a norm, but continue to use the familiar notation. Below, we will also write $w(I) = \int_I w\,dx$ for the (non-negative) measure with density $w$. It is a critical property, one that is key to the many beautiful properties of the $A_p$ theory, that we have $w > 0$ a.e. In particular, this means that $\sigma := w^{1-p'} = w^{-1/(p-1)}$ is unambiguously defined. Also, note that we have $w\,\sigma^{p-1} = 1$, which casts the definition of $A_p$ in a clear light: it requires that this pointwise equality continue to hold in an average sense, uniformly over all locations and scales.

We will refer to $\sigma$ as the dual measure. This language is justified by a useful observation from [Saw82]. The inequalities below are all equivalent for a linear operator $T$:

$$\|T(\sigma f)\|_{L^p(w)} \le C\,\|f\|_{L^p(\sigma)}, \tag{3.2}$$
$$\|Tf\|_{L^p(w)} \le C\,\|f\|_{L^p(w)},$$
$$\|T^*(wf)\|_{L^{p'}(\sigma)} \le C\,\|f\|_{L^{p'}(w)}.$$

To pass from the first line to the second, use the change of variables $f \mapsto \sigma \cdot f$. This is a routine calculation, based on the basic identity of the weighted theory that $(p-1)(p'-1) = 1$, which gives $\sigma^{1-p} = w$. And note that the last line is, after the analogous change of variables $f \mapsto w \cdot f$, the formal dual of the second. Thus, the inequality (3.2) expresses duality in a natural way: interchange the roles of $w$ and $\sigma$, and replace $p$ by the dual index $p'$.

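Of course we are primarily interested in the case of $p = 2$, where $\sigma = w^{-1}$. Two examples of $A_2$ weights to keep in mind, in dimension 1, are as follows. First, for an arbitrary measurable set $E \subset \mathbb{R}$ and $N > 0$, consider the weight $w = N\chi_E + \chi_{\mathbb{R}\setminus E}$. As long as $|E| > 0$, one has $\|w\|_{A_2} \le \max(N, 1/N)$. Indeed, since $\sigma = N^{-1}\chi_E + \chi_{\mathbb{R}\setminus E}$ and the $A_2$ characteristics of $w$ and $\sigma$ coincide, we can assume $N \ge 1$. For an interval $I$ we have

$$\frac{w(I)}{|I|}\cdot\frac{\sigma(I)}{|I|} = \frac{N|E\cap I| + |E^c\cap I|}{|I|}\cdot\frac{N^{-1}|E\cap I| + |E^c\cap I|}{|I|} \le N \cdot 1 = N.$$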
This shows that an A 2 weight need not have any smoothness associated with it. 
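The computation above depends on $I$ only through the density $\alpha = |E\cap I|/|I| \in [0,1]$. The short sketch below (with illustrative names) evaluates the $A_2$ ratio as a function of $\alpha$ and confirms the bound $\max(N, 1/N)$. In fact the ratio equals $1 + \alpha(1-\alpha)(N + N^{-1} - 2)$, so it is largest at $\alpha = \frac12$.

```python
def a2_ratio(N, alpha):
    """(w(I)/|I|) * (sigma(I)/|I|) for w = N on E and 1 off E, where alpha = |E ∩ I| / |I|."""
    w_avg = N * alpha + (1 - alpha)
    sigma_avg = alpha / N + (1 - alpha)   # sigma = 1/w
    return w_avg * sigma_avg


def sup_over_densities(N, steps=1000):
    # sample the densities alpha = k / steps, which include alpha = 1/2
    return max(a2_ratio(N, k / steps) for k in range(steps + 1))


for N in (0.1, 1.0, 4.0, 100.0):
    print(N, sup_over_densities(N), max(N, 1 / N))
```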
A second example is the borderline case of $w(x) = |x|$. This is not an $A_2$ weight, as the dual measure $\sigma(x) = |x|^{-1}$ is not locally integrable. But if we mollify the zero, setting $w_a(x) = |x|^a$ for $0 < a < 1$, then we have $\|w_a\|_{A_2} \simeq (1-a)^{-1}$. It is for such examples that one can verify that $\|H\|_{L^2(w_a)\to L^2(w_a)} \simeq (1-a)^{-1}$. (Test on $\chi_{[0,1]}$.) But these examples are somewhat misleading, in that the simple behavior of their zeros is not at all indicative of the intricacy of a general $A_2$ weight.
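For the power weights $w_a(x) = |x|^a$ the relevant averages can be computed in closed form. The sketch below (function names are illustrative) evaluates the $A_2$ ratio on intervals $[lo, hi] \subset [0,\infty)$. On intervals with left endpoint $0$ it equals $\frac{1}{(1+a)(1-a)}$ at every scale, which lies between $\frac12(1-a)^{-1}$ and $(1-a)^{-1}$. It only samples particular intervals, so it illustrates, rather than proves, the comparison $\|w_a\|_{A_2} \simeq (1-a)^{-1}$.

```python
def average_power(b, lo, hi):
    """(1/(hi - lo)) * integral_lo^hi x^b dx, for 0 <= lo < hi and b > -1."""
    return (hi ** (b + 1) - lo ** (b + 1)) / ((b + 1) * (hi - lo))


def a2_ratio_power(a, lo, hi):
    # w_a = x^a and its dual sigma = x^(-a), averaged over [lo, hi]
    return average_power(a, lo, hi) * average_power(-a, lo, hi)


for a in (0.5, 0.9, 0.99):
    ratio = a2_ratio_power(a, 0.0, 1.0)
    print(a, ratio, 1 / (1 - a))   # ratio = 1 / (1 - a^2)
```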

We comment on a classical Theorem of Muckenhoupt [Muc72a] concerning the $A_p$ weights and the Maximal Function, defined by

$$Mf(x) := \sup_{t>0}\,(2t)^{-d}\int_{[-t,t]^d} |f(x-y)|\,dy.$$

Theorem 3.3. For $1 < p < \infty$ and $w > 0$, the following are equivalent:

(1) $w \in A_p$;

(2) $M$ is bounded as a map from $L^p(w)$ to $L^{p,\infty}(w)$;

(3) $M$ is bounded as a map from $L^p(w)$ to $L^p(w)$.

Note that the weak and strong type bounds are thereby equivalent.

Clearly, the strong-type inequality implies the weak-type. Using the formulation (3.2), and applying the maximal function to the indicator of a cube, directly proves that $w \in A_p$. So, the content of the result is that the $A_p$ property implies the strong-type inequality. Here, the fact that $w > 0$ a.e. is decisive, and the shortest proof of this, six lines long, is due to Lerner [Ler08]. Nevertheless, it may seem puzzling that the weak and strong types should be equivalent. The sharp dependence of the Maximal Function on the $A_p$ characteristic is helpful here: for $w \in A_p$, how does the norm depend upon $\|w\|_{A_p}$? Buckley [Buc93] studied the question and proved

Theorem 3.4. For $w \in A_p$, we have

$$\|M\|_{L^p(w)\to L^{p,\infty}(w)} \lesssim \|w\|_{A_p}^{1/p},$$
$$\|M\|_{L^p(w)\to L^p(w)} \lesssim \|w\|_{A_p}^{1/(p-1)}. \tag{3.5}$$

Thus, the norm dependence is rather different for the weak and strong types. These are basic inequalities in view of Rubio de Francia extrapolation [RdF84].

A final, critical property for us is the so-called $A_\infty$ property. It states that an $A_p$ weight cannot be too concentrated in any one cube. Indeed, as we will illustrate in the context of the Maximal Function, this is the single property of $A_p$ weights that can be used to prove sharp results, and it can only be used once.

Lemma 3.6. Let $w \in A_p$, let $I$ be a cube, and let $E \subset I$. We then have

$$\frac{|E|}{|I|} \le \|w\|_{A_p}^{1/p}\Bigl[\frac{w(E)}{w(I)}\Bigr]^{1/p}. \tag{3.7}$$
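Proof. The property that $w > 0$ a.e. allows us to write, using Holder's inequality with exponents $p$ and $p'$,

$$|E| = \int_E w(x)^{1/p}\,w(x)^{-1/p}\,dx \le w(E)^{1/p}\,\sigma(E)^{1/p'} \le \Bigl[\frac{w(E)}{w(I)}\Bigr]^{1/p} w(I)^{1/p}\,\sigma(I)^{1/p'} \le \|w\|_{A_p}^{1/p}\Bigl[\frac{w(E)}{w(I)}\Bigr]^{1/p}|I|,$$

which proves the Lemma. Here we used $w^{-p'/p} = \sigma$ and, in the last step, the definition of $A_p$ in the form $w(I)\,\sigma(I)^{p-1} \le \|w\|_{A_p}|I|^p$.

□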
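Since the proof only uses Holder's inequality and $\sigma(E) \le \sigma(I)$, it actually gives the local form $\frac{|E|}{|I|} \le \bigl[\frac{w(I)\sigma(I)^{p-1}}{|I|^p}\bigr]^{1/p}\bigl[\frac{w(E)}{w(I)}\bigr]^{1/p}$, which is at most the right side of (3.7). The sketch below tests that local form on a discretized cube made of unit cells, with randomly generated weights and sets; the log-normal weights and all names are illustrative assumptions.

```python
import random


def lemma_3_6_holds(p, cells=50, trials=2000, seed=0):
    """Test |E|/|I| <= [w(I) sigma(I)^(p-1) / |I|^p]^(1/p) (w(E)/w(I))^(1/p) on random data.

    I consists of `cells` unit cells, w is constant on each cell, sigma = w^(1 - p')."""
    rng = random.Random(seed)
    p_dual = p / (p - 1)
    for _ in range(trials):
        w = [rng.lognormvariate(0.0, 2.0) for _ in range(cells)]
        sigma = [wi ** (1 - p_dual) for wi in w]
        density = rng.random()
        E = [i for i in range(cells) if rng.random() < density]
        if not E:
            continue
        local_ap = (sum(w) / cells) * (sum(sigma) / cells) ** (p - 1)
        lhs = len(E) / cells
        rhs = local_ap ** (1 / p) * (sum(w[i] for i in E) / sum(w)) ** (1 / p)
        if lhs > rhs * (1 + 1e-12):     # tiny slack for floating-point rounding
            return False
    return True


print([lemma_3_6_holds(p) for p in (1.5, 2.0, 3.0)])   # [True, True, True]
```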



4. Dyadic Grids 

Combinatorial arguments, stopping time arguments, and decompositions of functions and operators will frequently be done with the help of dyadic grids. In this section, we collect a number of elementary facts that we will need from time to time. At different moments, the methods and constructions of this section will in fact be decisive for us.

By a grid we mean a collection $\mathcal{Q}$ of cubes in $\mathbb{R}^d$ with $I \cap I' \in \{\emptyset, I, I'\}$ for all $I, I' \in \mathcal{Q}$. The cubes can be taken to be products of half-open intervals, although the behavior of functions or weights on the boundary of cubes in a grid will never be a concern for us. If $G, G' \in \mathcal{Q}$, with $G'$ the smallest element of $\mathcal{Q}$ that strictly contains $G$, we refer to $G'$ as the $\mathcal{Q}$-parent of $G$, and $G$ is a $\mathcal{Q}$-child of $G'$. Let $\mathrm{Child}_{\mathcal{Q}}(G')$ denote the collection of all $\mathcal{Q}$-children of $G'$. If the grid is understood, the $\mathcal{Q}$ is suppressed in the notation.

We will say that $\mathcal{Q}$ is a dyadic grid if for each cube $I \in \mathcal{Q}$ these two properties hold: (1) $I$ is the union of $2^d$ subcubes of equal volume (the children of $I$), and (2) the set of cubes $\{I' \in \mathcal{Q} : |I'| = |I|\}$ partitions $\mathbb{R}^d$.

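Associated to any dyadic grid $\mathcal{D}$ are the usual conditional expectations and martingale differences, given by

$$\mathbb{E}_I f := \chi_I\,|I|^{-1}\int_I f\,dx, \qquad \Delta_I f := \sum_{I' \in \mathrm{Child}(I)} \mathbb{E}_{I'} f - \mathbb{E}_I f. \tag{4.1}$$

And, we also set

$$\mathbb{E}_k f := \sum_{\substack{I \in \mathcal{D}\\ \ell(I) = 2^{-k}}} \mathbb{E}_I f, \qquad \Delta_k f := \sum_{\substack{I \in \mathcal{D}\\ \ell(I) = 2^{-k}}} \Delta_I f,$$

where $\ell(I)$ denotes the side length of $I$, so that $\Delta_k f = \mathbb{E}_{k+1} f - \mathbb{E}_k f$. Then, by the Martingale Convergence Theorem, for $f \in L^1(dx)$, $\mathbb{E}_k f \to f$ a.e. as $k \to \infty$. And, by the Muckenhoupt Theorem for the Maximal Function, for $w \in A_2$ and $f \in L^2(w)$, the same conclusion holds.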
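In one dimension the operators $\mathbb{E}_k$ and $\Delta_k$ are easy to compute on $[0,1)$ once functions are sampled on $2^K$ equal cells. The sketch below (names and the discretization are illustrative) checks two consequences of the definitions: the telescoping identity $\mathbb{E}_K f = \mathbb{E}_0 f + \sum_{k<K}\Delta_k f$, and the orthogonality of the martingale differences.

```python
import random


def expectation(f, k):
    """E_k f: replace f by its average on each dyadic interval of length 2^(-k) in [0, 1).

    f is a list of 2^K samples (one per cell of length 2^(-K)); requires 0 <= k <= K."""
    block = len(f) >> k
    out = []
    for start in range(0, len(f), block):
        avg = sum(f[start:start + block]) / block
        out.extend([avg] * block)
    return out


def difference(f, k):
    # Delta_k f = E_{k+1} f - E_k f
    return [u - v for u, v in zip(expectation(f, k + 1), expectation(f, k))]


def inner(f, g):
    # L^2([0, 1)) inner product of step functions sampled on equal cells
    return sum(u * v for u, v in zip(f, g)) / len(f)


K = 6
rng = random.Random(0)
f = [rng.gauss(0.0, 1.0) for _ in range(2 ** K)]
rebuilt = expectation(f, 0)
for k in range(K):
    rebuilt = [u + v for u, v in zip(rebuilt, difference(f, k))]
print(max(abs(u - v) for u, v in zip(rebuilt, f)))   # rounding-level error
```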
4.1. Proof of Buckley's Maximal Function Inequality. As an illustration of the use of the $A_\infty$ condition, let us return to Buckley's estimate (3.5), and prove it in the dyadic case. Namely, for a choice of dyadic grid $\mathcal{D}$ in $\mathbb{R}^d$, we define the associated Maximal Function

$$Mf(x) := \sup_{I \in \mathcal{D}} \chi_I(x)\,\mathbb{E}_I|f|,$$

where we now use $\mathbb{E}_I\phi := |I|^{-1}\int_I \phi\,dx$ for the (scalar) average. Also, we continue with the same notation for the Maximal Function, suppressing its dependence on the choice of grid. For this operator, we will prove (3.5). We first define the stopping cubes.

Definition 4.2. Let $\mathcal{D}$ be a dyadic grid and $\sigma$ a weight. Given a cube $I \in \mathcal{D}$, we define the stopping children of $I$, written $\mathcal{C}(I)$, to be the maximal dyadic cubes $I' \subset I$ for which $\mathbb{E}_{I'}\sigma > 4\,\mathbb{E}_I\sigma$. A basic property of this collection is that

$$\sum_{I' \in \mathcal{C}(I)} |I'| \le \tfrac14\,|I|, \tag{4.3}$$

which holds since the cubes $I'$ are disjoint subsets of $I$ with $|I'| < \sigma(I')/(4\,\mathbb{E}_I\sigma)$. We set the stopping cubes of $I$ to be the collection $\mathcal{S}(I) = \bigcup_{j\ge 0}\mathcal{S}_j(I)$, where we inductively define $\mathcal{S}_0(I) := \{I\}$ and $\mathcal{S}_{j+1}(I) := \bigcup_{I' \in \mathcal{S}_j(I)}\mathcal{C}(I')$. Thus, these are the maximal dyadic cubes such that, passing from parent to child in $\mathcal{S}$, the average value of $\sigma$ increases by at least a factor 4.
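The stopping construction of Definition 4.2 is easy to carry out on a discretized interval. The sketch below is one way to do it in one dimension, assuming $I = [0,1)$ is split into $2^K$ cells on which $\sigma$ is constant (the heavy-tailed random $\sigma$ and the function names are illustrative). It descends the dyadic tree, stops at the first interval whose $\sigma$-average exceeds four times that of the current stopping cube, and then checks (4.3) for every cube in $\mathcal{S}(I)$.

```python
import random


def average(sigma, start, length):
    return sum(sigma[start:start + length]) / length


def stopping_children(sigma, start, length):
    """Maximal dyadic J inside I = cells [start, start + length) with E_J sigma > 4 E_I sigma."""
    threshold = 4 * average(sigma, start, length)
    children, stack = [], [(start, length)]
    while stack:
        s, l = stack.pop()
        if average(sigma, s, l) > threshold:
            children.append((s, l))        # maximal: do not descend further
        elif l > 1:
            half = l // 2
            stack.extend([(s, half), (s + half, half)])
    return children


def stopping_cubes(sigma, start, length):
    """S(I): S_0 = {I}, and S_{j+1} collects the stopping children of the cubes in S_j."""
    generation, collection = [(start, length)], []
    while generation:
        collection.extend(generation)
        generation = [c for g in generation for c in stopping_children(sigma, *g)]
    return collection


rng = random.Random(2)
sigma = [rng.paretovariate(1.2) for _ in range(2 ** 10)]
cubes = stopping_cubes(sigma, 0, len(sigma))
# (4.3): the stopping children of each stopping cube fill at most a quarter of it
print(all(4 * sum(l for _, l in stopping_children(sigma, s, n)) <= n for s, n in cubes))
```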

Proof of (3.5). It is the fundamental Theorem of Eric Sawyer [Saw82] that for the Maximal Function, we have a powerful variant of the David-Journe T1 Theorem. Namely, for any pair of weights $(w, \sigma)$, we have the equivalence between these two inequalities:

$$\|M(\sigma f)\|_{L^p(w)} \le C_1\,\|f\|_{L^p(\sigma)},$$
$$\int_I M(\sigma\chi_I)^p\,w\,dx \le C_2^p\,\sigma(I), \qquad I \in \mathcal{D}. \tag{4.4}$$

Moreover, letting $C_1$ and $C_2$ be the optimal constants in these two inequalities, we have $C_1 \simeq C_2$. Notice that this shows that the Maximal Function bound reduces to a testing condition.

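And so, in the special case that $w \in A_p$ and $\sigma = w^{1-p'}$, we estimate the constant $C_2$, showing

$$\int_I M(\sigma\chi_I)^p\,w(dx) \lesssim \|w\|_{A_p}^{p'}\,\sigma(I).$$

Since $p'/p = 1/(p-1)$, this gives (3.5). We do so by passing to the stopping cubes $\mathcal{S}(I)$, and estimating as below, where we use some common manipulations in the $A_p$ theory.

$$\begin{aligned}
\int_I M(\sigma\chi_I)^p\,w(dx) &\lesssim \int_I \Bigl[\sum_{S\in\mathcal{S}(I)} \frac{\sigma(S)}{|S|}\,\chi_S\Bigr]^p\,w(dx) \\
&\lesssim \sum_{S\in\mathcal{S}(I)} \Bigl(\frac{\sigma(S)}{|S|}\Bigr)^p w(S) && \text{(4.5)}\\
&\le \|w\|_{A_p} \sum_{S\in\mathcal{S}(I)} \sigma(S) && \text{(4.6)}\\
&\lesssim \|w\|_{A_p}\,\|\sigma\|_{A_{p'}}\,\sigma(I) && \text{(4.7)}\\
&= \|w\|_{A_p}^{p'}\,\sigma(I). && \text{(4.8)}
\end{aligned}$$

This proves our estimate. The first inequality holds since, for $x \in I$, every dyadic $J \ni x$ contained in $I$ has $\mathbb{E}_J\sigma \le 4\,\mathbb{E}_S\sigma$ for the smallest $S \in \mathcal{S}(I)$ containing $J$. Here, we have taken these steps.

(4.5): Pointwise, the sum $\sum_{S\in\mathcal{S}(I)} \frac{\sigma(S)}{|S|}\chi_S(x)$ is super-geometric, so comparable to its maximal term. This allows us to move the $p$th power inside the sum.

(4.6): We are using the definition of $A_p$ here, in the form $\bigl(\frac{\sigma(S)}{|S|}\bigr)^{p-1}\frac{w(S)}{|S|} \le \|w\|_{A_p}$.

(4.7): The $A_\infty$ property is decisive. Since $\sigma \in A_{p'}$, (4.3) and (3.7), the latter applied to $\sigma$ with $E = I \setminus \bigcup_{I'\in\mathcal{C}(I)} I'$ (which has $|E| \ge \frac34|I|$), give, using the notation for the stopping children from Definition 4.2, $\sum_{I'\in\mathcal{C}(I)} \sigma(I') \le (1 - c\,\|\sigma\|_{A_{p'}}^{-1})\,\sigma(I)$ for a constant $c > 0$ depending only on $p$. This permits us to sum a geometric series over the generations $\mathcal{S}_j(I)$ to get this estimate.

(4.8): By inspection, $\|\sigma\|_{A_{p'}} = \|w\|_{A_p}^{1/(p-1)}$, and $1 + \frac{1}{p-1} = p'$.

□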



4.2. Random Dyadic Grids, Good and Bad Cubes. We are used to thinking of a dyadic grid as being canonical, namely the cubes

$$\mathcal{D} := \{2^k(n + [0,1)^d) : k \in \mathbb{Z},\ n \in \mathbb{Z}^d\}.$$

This choice has a strong edge effect: for instance, it distinguishes the origin, in that the origin is a vertex of infinitely many cubes. This sort of anomaly should, on the other hand, typically be rare. Quantifying this is achieved by a random grid. To present one typical example, if $\mathcal{Q}$ is a dyadic grid, and the cube $[0,1)^d$ is in $\mathcal{Q}$, it has one of $2^d$ possible parents, found by taking the cube to be the product of one of the two intervals $[0,2)$ and $[-1,1)$ in each coordinate separately. To randomize $\mathcal{Q}$, these possible choices of grids should be equally likely.

For any $\beta = \{\beta_j\} \in \boldsymbol{\beta} := (\{0,1\}^d)^{\mathbb{Z}}$, and cube $I$, set

$$I \dot{+} \beta := I + \sum_{j\,:\,2^{-j} < \ell(I)} 2^{-j}\beta_j, \tag{4.9}$$

where $\ell(I) := |I|^{1/d}$ is the side length of the cube. Then, define the dyadic grid $\mathcal{D}_\beta$ to be the collection of cubes $\mathcal{D}_\beta := \{I \dot{+} \beta : I \in \mathcal{D}\}$. This parametrization of dyadic grids appears explicitly in [Hyt08], and implicitly in [NTV03, section 9.1].

Place the uniform probability measure $\mathbb{P}$ on the space $\boldsymbol{\beta}$. Namely, the probability that any coordinate $\beta_j$ takes any one value in $\{0,1\}^d$ is $2^{-d}$, and the coordinates $\beta_j$ are independent of one another.
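Formula (4.9) can be checked directly. The sketch below works in one dimension with exact rational arithmetic; it truncates $\beta$ to finitely many coordinates and only looks at intervals of side at most one, both simplifying assumptions. For a random $\beta$ it verifies that every shifted interval of side $2^{-k-1}$ lies inside a shifted interval of side $2^{-k}$, so that $\mathcal{D}_\beta$ is again nested like a dyadic grid. The parent index $n = \lfloor (m - \beta_{k+1})/2 \rfloor$ comes from unwinding (4.9); the function names are illustrative.

```python
import random
from fractions import Fraction


def shifted_interval(k, n, beta):
    """Endpoints of I + beta (as in (4.9)) for I = 2^(-k) [n, n + 1) in one dimension.

    beta is a finite list of bits beta[0], beta[1], ...; the shift sums
    2^(-j) beta[j] over the j with 2^(-j) < 2^(-k), i.e. over j > k."""
    shift = sum((Fraction(beta[j], 2 ** j) for j in range(k + 1, len(beta))), Fraction(0))
    left = Fraction(n, 2 ** k) + shift
    return left, left + Fraction(1, 2 ** k)


def is_nested(beta, levels=6, span=8):
    """Check that each shifted child (level k + 1) sits inside a shifted parent (level k)."""
    for k in range(levels):
        for m in range(-span, span):
            child_left, child_right = shifted_interval(k + 1, m, beta)
            n = (m - beta[k + 1]) // 2            # floor division also handles negative m
            parent_left, parent_right = shifted_interval(k, n, beta)
            if not (parent_left <= child_left and child_right <= parent_right):
                return False
    return True


rng = random.Random(3)
beta = [rng.randint(0, 1) for _ in range(20)]
print(is_nested(beta))   # True
```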

Let us see how the randomization affects the edge effect mentioned above. Let $0 < \gamma < 1$ be a fixed parameter, and $r \in \mathbb{Z}_+$ a fixed integer. We say that a pair of cubes $(I, J) \in \mathcal{D}_\beta \times \mathcal{D}_\beta$ is good if the smaller cube, say $I$, satisfies $2^r\ell(I) < \ell(J)$, and

$$\operatorname{dist}(I, \partial J) > \ell(I)^{\gamma}\,\ell(J)^{1-\gamma}.$$

And a cube $I$ is said to be good if for all cubes $J$ with $\ell(J) > 2^r\ell(I)$, the pair $(I, J)$ is good. Otherwise, we say that the cube is bad.

An important property of goodness is that, for a cube $I$, the position of $I \dot{+} \beta$ and the goodness of $I \dot{+} \beta$ are independent. The spatial position of $I \dot{+} \beta$ is given by the formula (4.9), which only depends upon $\beta_j$ for $2^{-j} < \ell(I)$. And, for a larger cube $J$, the position of $J \dot{+} \beta$ can be written as

$$J \dot{+} \beta = J + \sum_{j\,:\,\ell(I) \le 2^{-j} < \ell(J)} 2^{-j}\beta_j + \sum_{j\,:\,2^{-j} < \ell(I)} 2^{-j}\beta_j.$$

Hence, the position of $J \dot{+} \beta$ relative to $I \dot{+} \beta$ depends only on the coordinates $\beta_j$ for $\ell(I) \le 2^{-j} < \ell(J)$, and so is independent of the location of $I \dot{+} \beta$.

As a consequence, the probability that a given cube is bad is independent of the location or scale of $I$. Denoting this probability by $\pi_{r,\gamma}$, it is an elementary exercise to see that $\pi_{r,\gamma} \lesssim 2^{-\gamma r}$. As it will turn out, it will be sufficient to have this probability less than one, for a choice of $\gamma$ that depends upon the Calderon-Zygmund Operator $T$, and which can be taken to be a small multiple of the constant $\delta$ in Definition 2.1.
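The probability $\pi_{r,\gamma}$ can be explored by simulation in one dimension. In the sketch below (a simplified model, with illustrative names) $I$ has side $1$, its ancestor of side $2^s$ contains it in a slot $t_s \in \{0, \dots, 2^s - 1\}$ built from independent fair bits, and $\operatorname{dist}(I, \partial J) = \min(t_s, 2^s - 1 - t_s)$. Only ancestors need to be examined, since a cube $J$ disjoint from the ancestor of the same size is at least as far from $I$ as that ancestor's boundary. Ancestors are truncated at side $2^{30}$, an assumption of the sketch. The same random bits are reused for every $r$, so the estimates are non-increasing in $r$, consistent with the decay $\pi_{r,\gamma} \lesssim 2^{-\gamma r}$.

```python
import random


def bad_fraction(r, gamma, trials=5000, max_level=30, seed=0):
    """Monte Carlo estimate of the probability that a unit interval I is bad."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        bits = rng.getrandbits(max_level)      # the coordinates beta_j that position I
        for s in range(r + 1, max_level + 1):  # ancestors J with side 2^s > 2^r
            t = bits & ((1 << s) - 1)           # slot of I inside its ancestor of side 2^s
            dist = min(t, 2 ** s - 1 - t)       # dist(I, boundary of J), in units of l(I)
            if dist <= 2 ** (s * (1 - gamma)):  # violates dist > l(I)^gamma l(J)^(1 - gamma)
                bad += 1
                break
    return bad / trials


estimates = [bad_fraction(r, 0.5) for r in range(1, 8)]
print(estimates)
```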

4.3. Haar Shifts, Dyadic Calderon-Zygmund Operators. In one dimension, the Martingale Difference in (4.1) is given by the rank-one projection $\Delta_I f = \langle f, h_I\rangle\, h_I$, where $h_I$ is the Haar function, given by $h_I := (-\chi_{I_-} + \chi_{I_+})\,|I|^{-1/2}$, where $I_\pm$ denote the two children of $I$. And then, the simplest possible dyadic Calderon-Zygmund operator would be a martingale transform

$$Tf := \sum_{I\in\mathcal{D}} \varepsilon_I\,\langle f, h_I\rangle\, h_I, \qquad \varepsilon_I \in \{-1, 1\}.$$

The amenability of these operators to issues of measurability, and to stopping time arguments, has long been exploited, leading to a remarkable set of properties that are known for these objects.
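Since the Haar functions $h_I$ form an orthonormal system, a martingale transform with signs $\varepsilon_I \in \{-1,1\}$ has norm at most one on $L^2$. The sketch below (names and the discretization to $2^K$ cells of $[0,1)$ are illustrative) builds the Haar functions, applies a martingale transform with random signs, and checks that $\|Tf\|_2^2 = \|f\|_2^2 - |\int_0^1 f|^2$, which is Parseval's identity for the Haar system together with the constant function.

```python
import random


def haar(n, k, j):
    """Samples of h_I = (-chi_{I-} + chi_{I+}) |I|^(-1/2) for I = [j 2^(-k), (j + 1) 2^(-k))."""
    length, h = n >> k, [0.0] * n
    scale = 2.0 ** (k / 2)                       # |I|^(-1/2)
    start, half = j * length, length // 2
    for i in range(start, start + length):
        h[i] = -scale if i < start + half else scale
    return h


def inner(f, g):
    # L^2([0, 1)) inner product of step functions sampled on n equal cells
    return sum(u * v for u, v in zip(f, g)) / len(f)


def martingale_transform(f, signs):
    """T f = sum over dyadic I of eps_I <f, h_I> h_I (all I with side at least 2 / n)."""
    n, out = len(f), [0.0] * len(f)
    levels = n.bit_length() - 1                  # n = 2^levels
    for k in range(levels):
        for j in range(2 ** k):
            h = haar(n, k, j)
            c = signs[(k, j)] * inner(f, h)
            out = [o + c * v for o, v in zip(out, h)]
    return out


rng = random.Random(4)
n = 2 ** 6
f = [rng.uniform(-1, 1) for _ in range(n)]
signs = {(k, j): rng.choice((-1, 1)) for k in range(6) for j in range(2 ** k)}
Tf = martingale_transform(f, signs)
mean = sum(f) / n
print(inner(Tf, Tf), inner(f, f) - mean ** 2)   # equal up to rounding
```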

Below, we will say that martingale transforms have complexity 1. To motivate this upcoming definition, let us recall the remarkable result of Stefanie Petermichl concerning the Hilbert transform. In one dimension, consider the dual to the classical Haar function given by $g_I = (-h_{I_-} + h_{I_+})/\sqrt{2}$, and the special operator given by

$$Uf = U_\beta f := \sum_{I\in\mathcal{D}_\beta} \langle f, g_I\rangle\, h_I.$$

The Hilbert transform can be recovered from the operators $U_\beta$; namely, the result below holds.

Theorem 4.10. Let $\mathrm{Dil}_\delta f(x) = f(x/\delta)$. For a non-zero constant $c$, we have

$$\int_1^2 \mathbb{E}_\beta\, \mathrm{Dil}_{\delta^{-1}}\, U_\beta\, \mathrm{Dil}_\delta\,\frac{d\delta}{\delta} = cH.$$

Here, the expectation is taken over $\beta \in \boldsymbol{\beta}$.
The Hilbert transform is distinguished by several properties, including being $L^2$-bounded, translation and dilation invariant, and (formally) satisfying $H(\cos) = c\cdot\sin$. By inspection, $U_\beta$ is $L^2$-bounded. The averaging procedure above provides translation invariance, and dilation invariance, as we have used the Haar measure $d\delta/\delta$ of the dilation group in the average. For the last property, note that $g_I$ is a localized cosine, while $h_I$ is a localized sine. We refer the reader to [Pet00, Hyt08] for a precise proof of this Theorem.
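The identity $H(\cos) = c\cdot\sin$ is easy to see numerically on the circle, where the Hilbert transform acts on each frequency $k$ by the Fourier multiplier $-i\,\mathrm{sgn}(k)$. The sketch below assumes the normalization $\frac{1}{\pi}\,\mathrm{p.v.}\,f * \frac1x$, for which $c = 1$; with the normalization of Section 2 (no factor $\frac1\pi$) one gets $c = \pi$. It uses a direct discrete Fourier transform so that it needs only the standard library; the function name is illustrative.

```python
import cmath
import math


def periodic_hilbert(samples):
    """Apply the multiplier -i sgn(k) to the discrete Fourier coefficients of periodic samples."""
    n = len(samples)
    coeffs = [sum(samples[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
              for k in range(n)]
    result = []
    for j in range(n):
        total = 0j
        for k in range(1, n):                      # k = 0 (the mean) is sent to zero
            if 2 * k == n:
                continue                           # the Nyquist mode has no sign; drop it
            multiplier = -1j if 2 * k < n else 1j  # k < n/2 is a positive frequency
            total += multiplier * coeffs[k] * cmath.exp(2j * math.pi * k * j / n)
        result.append((total / n).real)
    return result


n = 32
xs = [2 * math.pi * j / n for j in range(n)]
for freq in (1, 3):
    out = periodic_hilbert([math.cos(freq * x) for x in xs])
    print(freq, max(abs(u - math.sin(freq * x)) for u, x in zip(out, xs)))   # rounding-level
```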

The import of this result is that, in situations where there is translation and dilation invariance, one can prove results about the Hilbert transform by considering the much simpler operators $U_\beta$, for which tail behavior is no longer an issue. Similar representations are available for other distinguished convolution kernels. For instance, the Beurling operator [DV03] can be recovered from martingale transforms, while the Riesz transforms are closer to the Hilbert transform [PTV02]. The most general result known in this direction is [Vag09], which shows that all smooth, odd, one-dimensional Calderon-Zygmund kernels can be obtained by a variant of Stefanie Petermichl's method.

A more general definition is as follows. In higher dimensions, we mention that the martingale differences are finite-rank projections, but there is no canonical choice of the Haar functions in this case. Below, by a Haar function we will mean a function $h_I$, supported on $I$, constant on its children, and orthogonal to $\chi_I$ (with no assumption on normalization). And, by a generalized Haar function we mean a function $h_I$ which is a linear combination of $\chi_I$ and $\{\chi_{I'} : I' \in \mathrm{Child}(I)\}$. Such a function is supported on $I$ but need not be orthogonal to constants.

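Definition 4.11. For integers $(m,n) \in \mathbb{Z}_+^2$, we say that a linear operator $S$ is a (generalized) Haar shift operator of parameters $(m,n)$ if

$$Sf(x) = \sum_{I\in\mathcal{D}} \frac{1}{|I|}\int_I s_I(x,y)\,f(y)\,dy, \qquad s_I(x,y) = \sum^{(m,n)}_{\substack{I',J'\in\mathcal{D}\\ I',J'\subset I}} a^{I}_{I',J'}\,h_{I'}(y)\,h_{J'}(x), \tag{4.12}$$

where (1) in the second sum, the superscript $(m,n)$ on the sum means that in addition we require $\ell(I') = 2^{-m}\ell(I)$ and $\ell(J') = 2^{-n}\ell(I)$, (2) the functions $h_{I'}$, $h_{J'}$ are (generalized) Haar functions, and (3) $s_I(x,y)$ is supported on $I \times I$, with $L^\infty$ norm at most one. We say that the complexity of $S$ is $\max(m,n)$.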

These are dyadic variants of Calderon-Zygmund operators. Note in particular that (4.12) is analogous to (2.2), while the 'smoothness' criterion is replaced by the parameters $(m,n)$. In particular, this means that we have the representation

$$Sf = \sum_{I\in\mathcal{D}}\ \sum^{(m,n)}_{\substack{I',J'\in\mathcal{D}\\ I',J'\subset I}} \frac{a^{I}_{I',J'}}{|I|}\,\langle f, h_{I'}\rangle\, h_{J'}. \tag{4.13}$$

Consider a Haar shift operator. It is an $L^2$-bounded operator; in particular, its norm is at most one. The situation for generalized shifts is far more subtle, and here we should single out the following definition, for its distinguished role in the theory, though not necessarily in this paper. We call an operator $S$ a paraproduct if it is a generalized Haar shift of parameters $(0,1)$ or $(1,0)$. To be specific, it, or its dual, is of the form

$$Sf = \sum_{I\in\mathcal{D}} \mathbb{E}_I f\cdot h_I, \tag{4.14}$$

where $h_I$ is a Haar function. A fundamental fact here is the following special case of the T1 Theorem, in the dyadic case.

Theorem 4.15. Let $S$ be as in (4.14). Then, $S$ is $L^2$-bounded if and only if we have, uniformly over cubes $I \in \mathcal{D}$,

$$\|S\chi_I\|_2 \lesssim |I|^{1/2}.$$

This is a particular variant of the famous Carleson Embedding Theorem, and the main step in extending the David-Journe T1 Theorem to the dyadic setting.

More generally, we have the following quantitative form of the Dyadic T1 Theorem.

Theorem 4.16. Let $S$ be a generalized Haar shift operator of complexity $\mu$. Then $S$ extends to a bounded operator on $L^2(\mathbb{R}^d)$ if and only if there is a constant $\mathcal{S}$ such that, uniformly over cubes $I$,

$$\|S\chi_I\|_2 \le \mathcal{S}\,|I|^{1/2}, \qquad \|S^*\chi_I\|_2 \le \mathcal{S}\,|I|^{1/2}.$$

Moreover, we have $\|S\| \lesssim \mathcal{S} + \mu^2$.

There are two points to make here. The first is that there is a weak dependence of the norm of the operator on the complexity $\mu$. The second is a familiar feature of the Calderon-Zygmund theory, not mentioned to this point: these operators have strong structural properties. If $S$ is a bounded operator, then the sum in (4.13) is unconditional in $I$. The import of this feature, important for the proof of the main Theorem, is that decompositions of dyadic cubes lead immediately to decompositions of operators. Moreover, an $L^2$-bounded Calderon-Zygmund operator is necessarily bounded on many other spaces. Of particular interest for us is the endpoint estimate for $L^1$:
Theorem 4.17. Let $S$ be a dyadic shift operator of complexity $\mu$, which is bounded on $L^2(\mathbb{R}^d)$. Then, we have the estimate

$$\sup_{\lambda>0}\lambda\,|\{|Sf| > \lambda\}| \lesssim \{(1 + \|S\|_{2\to 2})^2 + \mu\}\,\|f\|_1. \tag{4.18}$$

This is a well-known principle, but the weak dependence on the complexity is a point observed by Hytonen. See [HPTV, Theorem 5.2].
4.4. A Weighted Version of the Dyadic T1 Theorem. A crucial step is to prove a weighted version of the T1 Theorem, one that holds for general weights. To emphasize this point, for a pair of weights $(w, \sigma)$, which are not necessarily related, we set the two-weight $A_2$ condition to be

$$\|w,\sigma\|_{A_2} := \sup_{I\in\mathcal{D}} \frac{w(I)}{|I|}\cdot\frac{\sigma(I)}{|I|}.$$

We have this variant of the T1 Theorem, for generalized Haar shift operators, in the weighted setting.

Theorem 4.19. Let $S$ be a generalized Haar shift operator of complexity $\mu$, and $(w, \sigma)$ a pair of weights. We have $\|S(\sigma f)\|_{L^2(w)} \le C\,\|f\|_{L^2(\sigma)}$, where

$$C \lesssim_d \mu\,\mathcal{S} + \mu^2\,\|w,\sigma\|_{A_2}^{1/2},$$

and $\mathcal{S}$ is the best constant in the testing inequalities, uniform over cubes $I \in \mathcal{D}$,

$$\int_I |S(\sigma\chi_I)|^2\,w(dx) \le \mathcal{S}^2\,\sigma(I), \qquad \int_I |S^*(w\chi_I)|^2\,\sigma(dx) \le \mathcal{S}^2\,w(I).$$

Here, we are considering the weighted inequality in its natural form; see (3.2). And we are bounding the weighted norm of the Haar shift in terms of the two-weight $A_2$ condition, as well as the testing conditions. Of particular importance for the proof of the linear bound is the very weak dependence of the constants on the $A_2$ characteristic. For the proof, see [NTV08], and for the quantitative estimate above, [HPTV, Theorem 3.4]. In particular, the proof is a weighted variant of the usual proof of the dyadic T1 Theorem, with an important point being that one should use weighted Haar functions to give the proof.

5. The Random BCR Algorithm

A proof of the T1 Theorem must, implicitly or explicitly, decompose the Calderon-Zygmund operator into appropriate components. In the language of random dyadic shifts, the remarkable result of [HPTV, Theorem 4.1] is the following.

Theorem 5.1. Let $T$ be a Calderon-Zygmund Operator with smoothness parameter $\delta$. Then, we can write

$$T = C\,\mathbb{E}_\beta \sum_{(m,n)\in\mathbb{Z}_+^2} 2^{-(m+n)\delta/2}\,S^{\beta}_{m,n}, \tag{5.2}$$

where (a) the expectation is taken over the space of random dyadic grids; (b) $S^\beta_{m,n}$ is a (random) dyadic shift of parameters $(m,n)$; (c) the shifts of parameters $(0,1)$ and $(1,0)$ are generalized shifts; (d) all other shifts are ordinary (non-generalized) Haar shifts; (e) the constant $C$ is a function of $T$ and the smoothness parameter $\delta$. In particular, we will have, uniformly over the probability space,

$$\|S^\beta_{m,n}\|_{L^2\to L^2} \le 1.$$
The contrast with Theorem 4.10 is noteworthy. The prior result obtains the Hilbert transform as a convex combination of Haar shifts of bounded complexity. The Theorem above obtains it, and indeed any Calderon-Zygmund operator, as a sum of Haar shifts, but one that is rapidly converging in complexity.

In the dyadic setting, similar results were proved by Figiel [Fig90], and independently by [BCR91], with the latter article being broadly influential. Expanding Calderon-Zygmund operators in this way reveals subtle approximation-theoretic properties of these operators. This method is not random, but has the disadvantage of using operators which are not purely dyadic.

Indeed, the Theorem above looks wrong. Using the standard Haar basis in one dimension, the inner product $\langle H h_{[0,1)}, h_{[0,2^k)}\rangle$ does not have the good decay properties in terms of complexity claimed above. Instead, one needs a concept like the goodness property of §4.2. And indeed, this is the main point: the inner product $\langle H h_I, h_J\rangle$ will be small if the pair of intervals $(I, J)$ is good.



In the prior proofs of the linear bound for operators, one used the averaging technique of Petermichl (see Theorem 4.10) to represent the Calderon-Zygmund operator as an average of Haar shifts of bounded complexity, and then verified the linear bound for such shifts. But the representation (5.2) gives another option: for an $A_2$ weight and an arbitrary Haar shift operator $S$, verify the linear bound, with only moderate growth in the complexity $\mu$ of the Haar shift. Here, we can allow any polynomial dependence on the complexity. We have already described two places where such polynomial dependence enters: the first is the dyadic two-weight T1 Theorem, Theorem 4.19, and the second is the weak-$L^1$ inequality (4.18).
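To see why polynomial growth in the complexity is harmless, here is a short derivation under an assumed bound of the form $\|S^\beta_{m,n}\|_{L^2(w)\to L^2(w)} \lesssim (1+\mu)^{k}\,\|w\|_{A_2}$ for shifts of complexity $\mu = \max(m,n)$, where the exponent $k \ge 0$ is a placeholder rather than a value taken from the results above. Apply the triangle inequality to (5.2), and use that there are $2\mu+1$ pairs $(m,n)$ with $\max(m,n) = \mu$, each with $m+n \ge \mu$:

```latex
\|T\|_{L^2(w)\to L^2(w)}
  \le C \sum_{m,n\ge 0} 2^{-(m+n)\delta/2}\,
      \mathbb{E}_\beta \|S^\beta_{m,n}\|_{L^2(w)\to L^2(w)}
  \lesssim C\,\|w\|_{A_2} \sum_{\mu\ge 0} (2\mu+1)\,(1+\mu)^{k}\,2^{-\mu\delta/2} .
```

The last series converges for every $\delta > 0$, so the resulting bound is linear in $\|w\|_{A_2}$.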

6. The Corona and the Linear Bound

The relevant result is [HPTV, Equation (5.5)].

Theorem 6.1. Let $S$ be a generalized Haar shift operator of complexity $\mu$, and $\mathcal{S} = \|S\|_{2\to 2}$. For $w \in A_2$, $\sigma = w^{-1}$, and a cube $I$, we have the testing inequality

$$\int_I |S(\sigma\chi_I)|^2\,w(dx) \lesssim \|w\|_{A_2}^2\,\sigma(I),$$

with an implied constant depending polynomially on $\mu$ and $\mathcal{S}$.

The method of proof here, aside from the dependence on the complexity, is derived from [LPR10], and is a subtle extension of the method used to prove (3.5), the sharp dependence on the $A_p$ characteristic for the Maximal Function. Indeed, the interested reader should first consult [LPR10], which does not seek to track the dependence of the bound on the constants. This argument uses the stopping cubes, as given in Definition 4.2, and this decomposition is then used to decompose the operator. The main step is then to identify a notable extension of the John-Nirenberg inequalities that holds in the two-weight setting, for the decomposed operator. With this, we conclude our discussion of the proof of the linear bound for Calderon-Zygmund operators.
7. History

7.1. The weighted theory came of age with the paper [HMW73] of Hunt-Muckenhoupt-Wheeden, showing that for a non-negative weight $w$, the Hilbert transform is bounded on $L^2(w)$ if and only if $w \in A_2$. Still, early proofs combined properties of the weight, including the $A_\infty$ property we have used, with the Reverse Holder inequality and the good-lambda technique, to deliver estimates for the norm of the Hilbert transform that were far from linear in $\|w\|_{A_2}$. These and the other comments about history reflect the author's knowledge, but as he was not a participant in the development of the subject, they will certainly be incomplete. Apologies for omissions and gaps are extended in advance.

7.2. The rapid development of the $A_p$ theory in the 1970s lent some credence to the thought that similar variants of the $A_p$ condition could be used to characterize the two-weight inequalities as well. The characterization for the Hardy operator [Muc72b] confirmed this. It was a surprise when Sawyer [Saw82] showed that such conditions cannot be used for the Maximal Function; instead, one must use the testing conditions in (4.4). (For a little more detail, consult the counterexample discussed in Sawyer's paper.)

7.3. In the two-weight setting, the Hardy operator is in some sense the easiest to study, the Maximal Function is the next step harder, then the fractional integrals, and finally the singular integrals. It took several years for the two-weight inequalities for the fractional integrals to be characterized. Sawyer gave the characterization in the T1 language in [Saw88]. This was contemporaneous with the David-Journe T1 Theorem, but the connection was not widely appreciated until much later, especially through the work of Nazarov-Treil-Volberg. For history on this last point, see [Vol03].

7.4. In the two-weight setting, one can have the fractional integral operators mapping $L^p$ into $L^p$; indeed, this is the hard case. In the case of $L^p$ being mapped into $L^q$, for $q > p$, there is a second characterization due to [GK89]; also see [GGKK98, Chapter 3] and [SW92]. This characterization can be used to prove the sharp $A_{p,q}$ bound for the fractional integrals on $\mathbb{R}^d$; see [LMPT10].

7.5. The paper of Sawyer-Wheeden [SW92] extends the two-weight inequality for the fractional
integrals to homogeneous spaces; this is an interesting line of work, which has been, and will
continue to be, explored in many directions.

7.6. The question of the sharp dependence of the norm estimates of different operators on the
$A_p$ characteristic was specifically raised by Buckley [Buc93], where the estimate (3.5) was
proved. These bounds for the Maximal Function, together with the Rubio de Francia
extrapolation technique, lead to an important simplification of the analysis of many of the
weighted inequalities. Namely, as is demonstrated in [DGPP05], identifying the sharp exponent of
the $A_p$ characteristic for a single distinguished choice of $p$ can prove the entire range of
inequalities. For the Calderon-Zygmund operators, this index is $p = 2$.
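As a rough illustration of this principle, and not a verbatim statement from [DGPP05], sharp extrapolation from the index $p = 2$ can be sketched as follows; here $\alpha > 0$, $C$ and $C_p$ are generic symbols introduced only for this display, and the precise hypotheses should be taken from [DGPP05]:

$$
\|T\|_{L^2(w)\to L^2(w)} \le C\,\|w\|_{A_2}^{\alpha}\ \text{ for all } w\in A_2
\quad\Longrightarrow\quad
\|T\|_{L^p(w)\to L^p(w)} \le C_p\,\|w\|_{A_p}^{\alpha\max\{1,\,1/(p-1)\}}\ \text{ for all } w\in A_p,\ 1<p<\infty.
$$

Taking $\alpha = 1$, the linear bound (1.2) would then give the power $\max\{1, 1/(p-1)\}$ of $\|w\|_{A_p}$ on $L^p(w)$, which equals $1$ for $p \ge 2$ and $1/(p-1)$ for $1 < p < 2$.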

7.7. In a different direction, Fefferman and Pipher [FP97] recognized the interest of this
question for singular integrals with weights $w \in A_1$. Wittwer [Wit00] proved the linear bound
in $A_2$ for martingale transforms. Petermichl and Volberg [PV02] showed the same for the
Beurling operator, proving a conjecture of Astala on quasi-conformal maps as a consequence.
Much later, a certain two-weight inequality for the Beurling operator was proved [LSUT08] as a
crucial step in proving another conjecture of Astala. These examples motivate in part the
interest in such questions. Other motivations come from spectral theory [KT07], operator
theory [NV02], and orthogonal polynomials [NPVY09].

7.8. It was an important breakthrough when Stefanie Petermichl proved the linear bound for
the Hilbert transform [Pet07]. This technique was based, on the one hand, on the representation
of the Hilbert transform as an average of Haar shifts of complexity one, and, on the other, on
the Bellman function method. The latter, deep technique can require substantive modification
when the Haar shift changes; these modifications were spelled out for the Riesz transforms in
[Pet08] and for dyadic paraproducts in [Bez08].

7.9. An inequality used in some of these developments is the so-called bilinear embedding
inequality of Nazarov-Treil-Volberg [NTV99], a deep extension of the (weighted) Carleson
embedding inequality to a two-weight setting. This inequality can also be interpreted in the
language of fractional integrals, and the Sawyer method can be used to prove it and to extend it
to other $L^p$ settings [LSUT09], as well as to vector-valued settings [Scu10].

7.10. Andrei Lerner [Ler09] devised a remarkable inequality giving pointwise control of a
function in terms of a sum of local oscillations. This inequality can be used to provide
equally remarkable proofs of the sharp $A_p$ inequalities for dyadic Calderon-Zygmund
operators [CUMP10a, CUMP10b], even in certain vector-valued situations. As yet, it is not
understood how to use this method on continuous Calderon-Zygmund operators.

7.11. Commutators of the form $[T, M_b]$, where $M_b$ denotes multiplication by $b$, are of
interest; for instance, the Calderon commutator can be written in this form. The paper of
Chung-Pereyra-Perez [CPP10] gives a complete discussion of this question in the setting of
$A_p$ weights. The two-weight variants appear to be largely open.

7.12. Lerner conjectured that the Littlewood-Paley square function would behave differently in
terms of the $A_p$ characteristic. Namely, $p = 3$ is the critical index, and the power of the
$A_3$ characteristic there is $1/2$. He used his 'local oscillation' inequality, as well as other
considerations, to prove this inequality in full generality [Ler10].
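Written out, and as a sketch of Lerner's result rather than a quotation of it, the estimate reads as follows, where $S$ is our shorthand (not notation used elsewhere in this section) for a Littlewood-Paley square function and $C_p$ is a generic constant:

$$
\|S f\|_{L^p(w)} \le C_p\,\|w\|_{A_p}^{\max\{1/2,\,1/(p-1)\}}\,\|f\|_{L^p(w)}, \qquad 1<p<\infty.
$$

The two exponents in the maximum coincide exactly when $1/(p-1) = 1/2$, that is, at $p = 3$; this is the sense in which $p = 3$ is the critical index for the square function, playing the role that $p = 2$ plays for Calderon-Zygmund operators.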

7.13. The paper [LPR10] proved the $A_2$ linear bound for all Haar shifts, using a Corona
decomposition that proved useful in the complete resolution of the conjecture. The technique
is to obtain a natural Corona decomposition with which to verify the testing conditions. This
paper gave a rather poor dependence on the complexity parameter of the Haar shift; the role of
complexity was only brought to the fore in [Hyt10].

7.14. Perez-Treil-Volberg used the full strength of non-homogeneous harmonic analysis,
and in particular the innovative paper [NTV04], to prove a remarkable extension of the $T1$
Theorem to the $A_2$ setting. Loosely stated: if $T$ has a Calderon-Zygmund kernel, then
$T$ extends to a bounded operator on $L^2(w)$, $w \in A_2$, if and only if the testing conditions
of Theorem 4.19 hold. Then it was shown [HLR+10] that the linear bound holds for
Calderon-Zygmund operators with sufficiently smooth kernels. This proof used the
Beylkin-Coifman-Rokhlin [BCR91] decomposition and the method of [LPR10] to verify the
testing conditions. A short time later, Hytonen [Hyt10] used a random variant of the
Beylkin-Coifman-Rokhlin method to give a proof of the linear bound for arbitrary smoothness,
again using the $A_2$ $T1$ Theorem of [NTV04]. This proof of the full conjecture was then
streamlined in [HPTV], giving the line of argument we have followed in this survey.

7.15. Lerner has conjectured that the weak-type bound for Calderon-Zygmund operators
should obey the linear bound in $A_p$ for all $1 < p < \infty$. This has been verified for dyadic
Calderon-Zygmund operators, without careful attention to the behavior of the exponents in
terms of complexity [HLRV09], and, in the smooth case with enough derivatives, in [HLR+10].
The principal technique is again derived from [LPR10], together with a (simple) testing
condition for the weak-type inequality for singular integrals given in [LSUT08]; see also
[LSUT09]. Indeed, this argument proves the linear bound in $A_p$ for the maximal truncations of
singular integrals, since this is the kind of operator for which we have the testing conditions.
It seems likely that this conjecture would follow from Theorem 5.1, if one tracks the dependence
on complexity.

7.16. The endpoint case of these estimates, namely $p = 1$, is also of interest. It is an
elementary consequence of a covering lemma argument that for an arbitrary weight $w$, the
Maximal Function $M$ maps $L^1(Mw)$ into $L^{1,\infty}(w)$. It was then conjectured that the
same inequality holds for singular integrals. This was disproved for Haar multipliers by
Maria Reguera [Reg10], and then for the Hilbert transform by Reguera-Thiele [RT10].

7.17. With the failure of the most optimistic form of the conjecture above, one can ask
whether its natural variant for $w \in A_1$ holds. Namely, does the Hilbert transform map
$L^1(w)$ into $L^{1,\infty}(w)$ for $w \in A_1$, with norm dominated by a constant times
$\|w\|_{A_1}$? This also fails in the dyadic case [NRVV10]. On the other hand, the Hilbert
transform does map $L^1(w)$ into $L^{1,\infty}(w)$, and the best known upper bound on the norm is
of the order of $\|w\|_{A_1} \log(e + \|w\|_{A_1})$. See [LOP09] for more information on these
last two points.

7.18. An interesting feature of the linear bound in $A_2$ is that one needs a substantive portion
of two-weight theory to address it; this is Theorem 4.19 above. The general two-weight
question is a rather intricate one, and a full discussion would carry us beyond the scope of this
text. The interested reader should consult [Vol03] for a general introduction, and the more
recent papers [LSUT09, LSUT08, LSUT10, NTV04].